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Parameter estimation in ordinary diflerential equations, altliough applied and refined in various 
fields of the quantitative sciences, is still confronted with a variety of difficulties. One major challenge 
is finding the global optimum of a log-likelihood function that has several local optima, e.g. in 
oscillatory systems. In this publication, we introduce a formulation based on continuation of the 
log-likelihood function that allows to restate the parameter estimation problem as a boundary value 
problem. By construction, the ordinary differential equations are solved and the parameters are 
estimated both in one step. The formulation as a boundary value problem enables an optimal 
transfer of information given by the measurement time courses to the solution of the estimation 
problem, thus favoring convergence to the global optimum. This is demonstrated explicitly for the 
fully as well as the partially observed Lotka-Volterra system. 



Ordinary differential equation (ODE) models play a 
key role for understanding and predicting the behavior 
of dynamic systems originating from various disciplines 
like physics, chemistry or the life sciences. In many 
cases, these dynamic models depend on parameters 
that are not known beforehand but need to be deter- 
mined from measurement data by means of statistical 
methods. Inference of parameters of dynamic systems 
from measurement data is commonly realized by op- 
timization of the likelihood function. Optimization 
is a broad field and many different algorithms have 
come up over the last decades [iHS], each of them 
with problem specific advantages and disadvantages. 
One characteristic distinction between optimizers is 
whether they include stochastic! ty or not. Stochastic 
optimizers, e.g. evolutionary algorithms [S], particle 
swarms [7l |8] or simulated annealing |9] are especially 
valuable for discontinuous likelihood functions where 
gradient information is not available or not defined. On 
the other hand, many deterministic optimizers employ 
information about the differentiable structure of the 
likelihood, i.e. gradient and Hessian information. For 
differentiable likelihood functions this has the advantage 
that convergence is achieved much faster. However, this 
approach has to struggle with other difficulties. If the 
likelihood function has local optima, the outcome of the 
optimization procedure depends on the starting point. 
Once the optimizer is approaching a local optimum, the 
algorithm will not leave this optimum disregarding the 
existence of better optima. 

The problem of local optima has been addressed by 
several approaches. It has been shown that a com- 
bination of deterministic and stochastic optimization 



can help escaping local optima and finding the global 
optimum jlOj . Other approaches modify the dynamic 
system by homotopy transformations [TT] introducing a 
factor A that allows for a continuous transition between 
the modified, convex problem and the original problem. 
Hence another approach is the multiple-shooting method 
jl2j . Most optimizers follow a single-shooting approach, 
i.e. model trajectories are computed based on given 
initial values and the outcome is compared to the data. 
In contrast, the multiple-shooting approach introduces a 
grid of time-points and initial condition parameters. The 
optimizer is initialized with discontinuous trajectories 
and constraints are defined guaranteeing that all trajec- 
tories become continuous in the course of optimization. 
In our work, we present a reformulation of the opti- 
mization problem as a boundary value problem (EVP). 
The motivation for this approach is two-fold. The first 
argument follows from the history of gradient-based 
single-shooting optimization for parameter estimation in 
ordinary differential equations. The performance and 
accuracy of this method has been enormously increased 
by solving the ODE together with its' derivatives with 
respect to the parameters, i.e. the sensitivity equations, 
in one integration run. This augmentation step allows 
a fast and accurate computation of the gradient but 
still evaluation and optimization of the objective func- 
tion are separate steps. Our aim is to take the next 
logical step and incorporate even optimization into 
the ODE integration. The second argument takes up 
the multiple-shooting idea: the possibility to initialize 
EVP solvers with prior knowledge like approximate 
trajectories from measurement data. If the optimization 
problem is equivalently expressed as a boundary value 
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problem then a good initialization should increase the 
solver's ability to find the correct solution. 
In the following, we show how both objectives can be 
matched. Our augmentation of the ODE is based on 
continuation of the log-likelihood function to a differen- 
tiable function of time. The resulting system constitutes 
a BVP. By construction, the solution of this EVP is 
optimal with respect to the log-likelihood function and 
it can be obtained by standard numerical BVP solvers. 
The initialization of the BVP solver allows for an efficient 
transfer of information provided by the observation data. 

We consider a dynamic system defined by ordinary dif- 
ferential equations (ODE), 



df 



f{x,p), x(0) = xo. 



(1) 



with time i, states x e R" and parameters p E WC . The 
extension of the system by -^p = transforms the pa- 
rameters into usual state variables. For the augmented 
states ^ = {x,p), parameter estimation becomes an es- 
timation of initial conditions. Furthermore, let Xohs = 
(xi, . . . , Xm), with m < n, be the observed states and let 
{x^{tj), . . . ,x^{tj)}j denote the time-discrete observa- 
tion data. The observation data can be approximated by 
a continuous data function x^^^{t) = (a;f (t), . . . ,x^{t)), 
e.g. by linear interpolation or spline interpolation. On 
the other hand, we assume that measurement events for 
different time points are statistically independent, conse- 
quently, the likelihood function 
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factorizes and the negative log-likelihood 



(2) 



^(eo|{a;o'?,s(i.)b) - E - ^ogL,{^o,x^^,{t,)) (3) 



T 



r{^o,t)dt 



(4) 



can be approximated by the integral. Here, 
r(^o,t) denotes the continuous approximation of 
— logLj(^o,2;^g(tj)). For standard normally distributed 

noise, ^(^o,^ becomes {xohs{t) ~ which will be 

used in the following. The argumentation also holds for 
other noise distributions. 

An initial condition vector = (^^oiPo) G R"^'' is a lo- 
cal optimum if \/£{£,Q,t = T) = vanishes at the latest 
observed time point T. Since ^(^O)^ = 0) = for all val- 
ues of ^0 at initial time, the gradient \7£{^o,t = 0) = 
vanishes, too. This observation constitutes the boundary 
condition that needs to be matched for a local optimum. 



V^(^o,0) = V^(eo,T) =0. 



(5) 



Each line of eq. ([s]) has the potential to determine one 
parameter value. In order to include this condition into 
the dynamic system ([!]), is derived with respect to 
time. At this point it is crucial having approximated the 
negative log-likelihood by an integral expression: 



/ Vr(Co,T)dT 
Jo 



Vr(eo,t) 



(a;obs(t) - x^^^{t))*D^„Xohs{t). 



(6) 
(7) 
(8) 



Here, " * " indicates the transpose and T)^gXohs{t) de- 
notes the Jacobian of XobsC^) with respect to the initial 
conditions also known as the sensitivities of the solu- 
tion trajectory Xobsit)- The sensitivities are determined 
by an ODE, too, hence the complete systems reads 



d 2 , 

-m^-{x.b.~ 



obs ■ 



(9) 
(10) 

(11) 



The sensitivity equations ([lOj) have fixed initial 
conditions, diag(l„+,.), with the identity matrix 
In+r e ]£i{n+r)x{n+r) _ rj,^^ gradient equations ([TT| 
have both zero initial and final condition, see eq. ([5j), 
a boundary constraint that fully determines the initial 
values of the augmented states in eq. On the other 
hand, these are the parameters and initial conditions we 
seek to estimate. Hence, the desired values optimizing 
the negative log-likelihood are part of the solution of the 
two-point boundary value problem, eqs. (9pT). This is 
the principle behind our optimization approach. 
The solution of this two-point boundary value problem 
can be numerically obtained by different solver imple- 
mentations, see [13j for a survey. Compared to gradient 
based single-shooting methods, eq. (11) represents the 
pivotal difference. It translates optimization into the 
ambit of integration. 

In the following, we examine the Lotka-Volterra equa- 
tions [TIHT^ . They give a basic description of the preda- 
tor and prey population dynamics. The system is defined 
by two differential equations 



d 
dt 
d 

dt 



B = -B(7-M), 



(12) 
(13) 



where A and B correspond to prey and predator, 
respectively. The parameters a and P describe prey 
reproduction and reduction, 7 and 5 describe preda- 
tor extinction and reproduction. For non-zero initial 
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condition, the solution of eqs. ( 12p3 1 is a sustained 
oscillation. 

In the first part of the study, it is assumed that both 
populations are observed. Observation data is simulated 
by numerically integrating the ODE system with Aq — 1, 
Bo = 1.2, a = 0.45, /3 = 0.5, 7 = 0.3, S = 0.1 and 
adding Gaussian noise with a = 0.15 to the solution. 
Subsequently, the parameter values are sought to be 
recovered from randomly chosen initial parameter 
guesses. The initialization of the BVP solver, i.e. the 
grid of initially assumed values for each of the variables 
in eqs. (9pT), is obtained by integrating the sensitivity 
equations ( 10 1 with the initial parameter guess and the 
data interpolations as input trajectories. The gradient 
is assumed to vanish over the entire range. 
Independently of the initial parameter values, the BVP 
solver converges to the same solution. The result for 
one representative data set is shown in Figure [l] The 
States panel shows the solutions of the state variables A 
and B together with the data points and error bars. As 
expected, the predator and prey trajectories hit about 
67% of the error bars. In the Parameters panel, the 
solutions of the state variables a, /3, 7 and S are shown 
on a logarithmic scale, i.e. the dynamic parameters. 
The dots indicate the values that have been used for 
simulation. The Sensitivities panel shows the sensitivity 
trajectories which are typical for oscillating systems, 
i.e. oscillations with increasing amplitude. Finally, in 
the Negative log-likelihood gradient panel, the gradient 
solution is plotted. The time scale of gradient changes 
is determined by the sampling density of the simulated 
time course. It hits the ground line in the end point as 
desired, guaranteeing an optimum. 

The BVP method has been tested systematically 
against a single-shooting approach based on the 
Levenberg-Marquardt algorithm. For different simulated 
data sets, both, BVP method and single-shooting method 
have been applied to the same random set of initial pa- 
rameter guesses. In order to avoid that, by chance, pa- 
rameter vectors are too similar, we employed Latin hy- 
percube sampling with a hypercube covering 4 orders 
of magnitude around the true parameter values. Both 
optimization approaches failed convergence a number of 
times in which case 10^ was assigned as value to the 
negative log-likelihood. Figure [5] shows the first 200 
sorted negative log-likelihood values of the total 300 ini- 
tial guesses for both approaches. 

The single-shooting approach gets stuck in different lo- 
cal optima and finds the global optimum only in 4% of 
the cases. In contrast, the BVP method proves to be 
robust against different initial guesses for the parameter 
values. The solution converges either to the global min- 
imum or it fails convergence. It is 6 times more efficient 
in finding the best optimum. Figure [2] also gives some 
indication about parameter convergence regions. For the 



BVP method, the set of initial parameters that finally 
converged to the best parameter value covers almost the 
total range. However, fewer initial guesses with a and S 
larger than 1 lead to a successful reconstruction of the 
BVP solution. The broad plateau of local optima for the 
single-shooting method is reflected in a clear shift of fi- 
nal parameter values and a certain number of randomly 
distributed final parameters. 
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FIG. 1: Solution generated by the BVP solver. The 4 pan- 
els show state solutions with simulated data points, parameter 
solutions on a logarithmic scale with true values as dots, sensi- 
tivity solutions and the gradient of the negative log-likelihood. 
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FIG. 2: Comparison of single-shooting and BVP method 
tested on the fully observed Lotka-Volterra system. Each 
method has been applied to simulated data sets and for each 
data set, initial parameter vectors have been generated by 
Latin hypercube sampling covering a range of 4 orders of 
magnitude around the true parameter values. The result- 
ing negative log-likelihood values were sorted, normalized by 
the smallest value and plotted on a logarithmic scale. In the 
scatter plots, initial parameter vectors are plotted against fi- 
nal parameter values for each optimization resulting in a neg- 
ative log-likelihood value lower than 10^. 



In a second step, the observation of B is omitted and /3 
is fixed to 1 in order to keep the system identifiable. 
Analogously to the fully observed system, data sets have 
been simulated and Gaussian noise has been added. The 
comparison between the single-shooting method and the 
BVP method is shown in Figure [3j The plots indicate 
that the situation becomes more intricate if only one 
state is observed. The convergence rate drops below 2% 
for both approaches and the BVP method reconstructs 
a variety of local optima, each of them with an almost 
identical negative log-likelihood value. From the scat- 
ter plots in Figure |3j two conclusions can be drawn for 
the BVP method: First, each of the globally convergent 



solutions started from the negative orthant and second, 
there is a submanifold with boundary in the parameter 
space corresponding to the same locally optimal negative 
log-likelihood value. 
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FIG. 3: Comparison of single-shooting and BVP method 
tested on the partially observed Lotka-Volterra system. Each 
method has been applied to simulated data sets and for each 
data set, initial parameter vectors have been generated by 
Latin hypercube sampling covering a range of 4 orders of 
magnitude around the true parameter values. The result- 
ing negative log-likelihood values were sorted, normalized by 
the smallest value and plotted on a logarithmic scale. In the 
scatter plots, initial parameter vectors are plotted against fi- 
nal parameter values for each optimization resulting in a neg- 
ative log-likelihood value lower than 10'^. 



Figure [4] shows the same picture for initial guesses 
starting from the negative orthant only. In agreement 
with the expectation, the number of BVP solutions cor- 
responding to the global optimum increases consider- 
ably and exceeds the success rate of the single-shooting 
method by a factor of 5. 
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FIG. 4: Comparison of single-shooting and BVP method 
tested on the partially observed Lotka-Volterra system. Each 
method has been applied to simulated data sets and for each 
data set, initial parameter vectors have been generated by 
Latin hypercube sampling covering the negative orthant of 
parameter space. The resulting negative log-likelihood val- 
ues were sorted, normalized by the smallest value and plotted 
on a logarithmic scale. In the scatter plots, initial parame- 
ter vectors are plotted against final parameter values for each 
optimization resulting in a negative log-likelihood value lower 
than 10^. 



Finally, we come to the following conclusions: In case 
of a partially observed system, the success rate of the 
BVP method depends on favorable initial conditions. 
Whereas the single-shooting algorithm performed equally 
badly over the entire parameter space, for the boundary 
value approach, it was possible to identify an attractive 
basin resulting in a considerably increased convergence 
rate. 

From the fully observed Lotka-Volterra system, we con- 
clude that the boundary value approach is excellently 
suited as optimization approach if numerous observables 
are available. In this case, it clearly outperforms the 
single-shooting Levenberg-Marquardt algorithm in terms 
of convergence to the global optimum. The strength of 



the presented optimization approach is its ability to ex- 
ploit the measured time courses in a natural way. This 
favors convergence to the global optimum. Unlike single- 
shooting approaches, convergence to local optima or false 
convergence claims are efficiently reduced. 
The key to this advantage is the reformulation of the es- 
timation problem as a boundary value problem which, 
in turn, is enabled by continuation of the negative 
log-likelihood function to a time-differentiable function. 
This restatement elegantly incorporates optimization and 
ODE solution in one task. By nature of the boundary 
value problem, an initial guess for all state variables needs 
to be presented to the numerical solver. The initialization 
by measured time-courses carries exactly the information 
that is necessary to make the algorithm converge to the 
best optimum. 
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